[57] ABSTRACT

A method and apparatus for detecting cuts in a digital video 
signal made up of a series of video images is disclosed. The 
method comprises segmenting a plurality of the video 
images into a number of cells, each comprised of a number 
of pixels having a pixel intensity value. Then, a plurality of 
cell contrast vectors each associated with one of the seg- 
mented video images is generated. An element of the cell 
contrast vector comprises the standard deviation of the pixel 
intensity values for the pixels in a particular cell. A cut detect 
signal is generated for a video image in response to the cell 
contrast vector associated with that image. 
METHOD AND SYSTEM FOR DETECTING CUTS IN A VIDEO SIGNAL

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to U.S. application Ser. No.
08/660,641, filed on Jun. 7, 1996 by E. North Coleman, Jr.
and entitled "Method and System for Detecting Transitional
Markers Such As Uniform Fields In A Video Signal."

This application is related to U.S. application Ser. No.
08/660,257, filed on Jun. 7, 1996 by E. North Coleman, Jr.
and entitled "Method and System for Detecting Dissolve
Transitions In A Video Signal," pending.

This application is related to U.S. application Ser. No.
08/660,564, filed on Jun. 7, 1996 by E. North Coleman, Jr.
and entitled "Method and System for Detecting Fade
Transitions In A Video Signal," pending.

This application is related to U.S. application Ser. No.
08/660,292, filed on Jun. 7, 1996 by E. North Coleman, Jr.
and entitled "Method and System for Detecting The Type Of
Production Media Used To Produce A Video Signal," pending.

These applications have all been assigned to Electronic
Data Systems Corp. of Plano, Tex.

TECHNICAL FIELD OF THE INVENTION

This invention relates generally to video signal processing
and more particularly to a method and system for detecting
cuts in a video signal.

BACKGROUND OF THE INVENTION

A typical television commercial, television program, or
movie comprises a series of video clips pieced together. For
example, if a scene in a television program is being filmed
by cameras at three different locations in a room, that
particular scene may include a series of video clips wherein
each of the clips was originally recorded by one of the three
cameras. A particular video clip is normally separated from
an adjacent video clip using a common video transitional
marker such as a cut, dissolve, or fade. Blank or uniform
fields may also be used to provide visual separation between
video clips.

As digital storage becomes more economical, owners of
rights to video recordings have begun to digitally archive
those recordings. Digital archiving allows video owners to
easily preserve old video recordings that are in danger of
deterioration or destruction. Digital archiving also allows
video owners to separate recordings into individual clips for
marketing purposes. For example, a clip from a television
program or a movie might be used in a television commercial
or in an advertisement placed on the Internet. Also,
individual video clips might be incorporated into multimedia
software. Television news organizations may more easily
share digital video recordings that have been divided up into
individual video clips.

Separating digitized video recordings into individual
video clips can be a costly process. Initially, separation of
digitized recordings into individual video clips was
performed manually. An operator of specialized equipment
and/or software would manually locate the various
transitional markers in the digitized video recording and
record the position of those transitional markers.

Techniques have also been developed to automatically
identify transitional markers in digitized video recordings
using computer hardware, computer software, or a combination
of both. Unfortunately, existing techniques use global
metrics which focus on each individual video image as a
whole to determine image to image (often field to field)
similarity. These techniques are not as accurate as is
desirable [...] local spatial information contained in a
video image. Moreover, [...] techniques make various
measurements of the color components of a video image. These
techniques are not easily adapted to process both black and
white and color video recordings.

SUMMARY OF THE INVENTION

The invention comprises a method and system for detecting
a cut in a digital video signal. The invention employs a
technique that segments video images into a series of cells
so as to retain spatial information to achieve greater
accuracy in predicting a cut in a video signal. In accordance
with the method of the invention, a cut is detected in a
digital video signal made up of a series of video images. A
plurality of the video images are segmented into a number of
cells. Each cell includes a number of pixels each having a
pixel intensity value which represents the intensity of that
pixel. A plurality of cell contrast vectors each associated
with one of the segmented video images is generated. Each
element of the cell contrast vector is associated with one of
the cells and comprises the standard deviation of the pixel
intensity values for the pixels in that particular cell. A
cut detect signal is generated for a video image in response
to the cell contrast vector associated with that image.

The invention has several important technical advantages.
The invention can be used to locate the exact point in time
at which a cut occurs in a video signal. This allows the
owner of a video recording to easily index the recording so
as to be able to quickly locate the point in the recording at
which each cut occurs to allow a viewer of the recording to
jump to that point. In other words, the invention allows
intelligent random access to the recording. The invention
allows the owner of a video recording to automatically
archive the video recording because the invention can
accurately identify the location of cuts and create a list of
those locations for archival of video clips. Such automated
archival allows easy random access type retrieval of a
specific video clip. Editing of the recording at a later time
may be simplified by this feature.

The invention achieves high accuracy in predicting cuts in
video signals as the invention captures a coarse level of
spatial information about individual video images in a
digital video signal. This information is used in the
disclosed method of identifying cuts. The use of spatial
information allows more accurate identification of cuts in
digital video recordings. Because the invention uses a coarse
level of spatial information, the invention has higher noise
immunity than some existing techniques of identifying cuts.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention
and the advantages thereof, reference is now made to
the following descriptions taken in conjunction with the
accompanying drawings in which:

FIG. 1 illustrates a system constructed in accordance with
the invention for identifying transitional markers in a
digital video signal;

FIG. 2 illustrates an exemplary diagram of the video
transition application of FIG. 1;

FIG. 3 illustrates the segmentation of a single example
video image of the digital video signal;

FIG. 4 illustrates a flow chart of a method of detecting
cuts in a digital video signal in accordance with the inven- 
tion; 

FIG. 5 illustrates a flow chart of a method for detecting 
blank and/or uniform video images in a digital video signal 
in accordance with the invention; 

FIG. 6 illustrates a flow chart of a method for detecting 
fade transitions in a digital video signal in accordance with 
the invention; 

FIG. 7 illustrates a flow chart of a method for detecting 
dissolve transitions in a digital video signal in accordance 
with the invention; and 

FIG. 8 illustrates a flow chart of a method for detecting the 
type of production media used to create a digital video signal 
in accordance with the invention. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The preferred embodiment of the present invention and its 
advantages are best understood by referring to FIGS. 1-3 of 
the drawings, like numerals being used for like and corre- 
sponding parts of the various drawings. 

FIG. 1 illustrates a video event detection system 10 that 
comprises one embodiment of the present invention. In this 
embodiment, video event detection system 10 comprises 
computer software running on a general purpose computer 
known as a Sun SPARC workstation. Video event detection 
system 10 may be adapted to execute any of the well known 
MSDOS, PCDOS, OS2, UNIX, Motif, MAC-OS™,
X-WINDOWS™, or WINDOWS™ operating systems, or
other operating systems. Video event detection system 10
comprises processor 12, input device 14, display 16,
memory 18 and disk drive 20. The present invention 
includes computer software that may be stored in memory 
18 or on disk drive 20 and is executed by processor 12. Disk 
drive 20 may include a variety of types of storage media 
such as, for example, floppy disk drives, hard disk drives,
CD ROM disk drives, or magnetic tape drives. Data may be
received from a user of video event detection system 10 
using a keyboard or any other type of input device 14. Data 
may be output to a user of video event detection system 10 
through display 16, or any other type of output device.

Video event detection system 10 includes video transition 
application 22 which is a computer software program. In 
FIG. 1, video transition application 22 is illustrated as being
stored in memory 18 where it can be executed by processor 
12. Video transition application 22 may also be stored in disk 
drives 20. Video transition application 22 processes digital 
video signals and identifies various transitional events 
occurring in the digital video signals. In this example, a 
digital video signal 24 is stored on disk drives 20. 
Alternatively, video event detection system 10 could receive 
an analog video signal from an external source, digitize that 
video signal and store it on disk drives 20 or in memory 18.
A digital video signal could also be received from an 
external source. The operation of video transition applica- 
tion 22 will now be described in connection with FIGS. 2-8. 

FIG. 2 illustrates a block diagram of video transition
application 22 which is constructed in accordance with the
invention. As shown, video transition application 22
comprises low level processor 26, which outputs data to
mid-level processor 28. The output of mid-level processor 28
is provided to cut detector 30, blank/uniform image detector
32, fade detector 34, dissolve detector 36, and media
detector 38. The output could be provided to other detectors
or a subset of these detectors. The outputs of each of the
detectors are provided to event resolver 40. In this
embodiment, each of these components of video transition
application 22 comprises computer software. All or a portion
of these functions could also be performed using hardware. In
addition, although the functions of video transition
application 22 have been divided among several software
routines, the structure of video transition application 22
could be changed without departing from the scope of the
invention.

Before discussing the operation of each component of
video transition application 22, a brief overview of the
operation of video transition application 22 is appropriate.
Low level processor 26 receives a video signal comprising
a series of video images and segments each video image in
a digital video signal into a plurality of cells. Each cell
includes a number of pixels which are each, in turn,
associated with a pixel intensity value. Low level processor
26 generates a cell contrast vector and a cell intensity
vector for each segmented video image. The components of the
cell intensity vector are each associated with one of the
cells of the segmented video image and comprise the average
pixel intensity value for pixels in that cell. Each element
of the cell contrast vector is associated with one of the
cells in the segmented video image and comprises the standard
deviation of the pixel intensity values for pixels in that
cell. Although each contrast vector element is correlated to
the contrast of a particular cell, it is not a measure of the
contrast.
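
The overview above can be illustrated with a minimal sketch of the
low level processing step. It assumes the geometry this embodiment
uses (a 320x240 image, a 9x7 grid of 32x32 pixel cells, and an
unused border of 16 pixels horizontally and 8 pixels vertically).
The function name cell_vectors, the image[y][x] layout, and the
choice of a population standard deviation are assumptions made for
illustration, not details taken from the patent.

```python
import math

# Geometry from the embodiment described in this document.
CELL = 32                    # cells are 32 x 32 pixels
GRID_W, GRID_H = 9, 7        # 9 x 7 grid, 63 cells in total
BORDER_X, BORDER_Y = 16, 8   # unused border, in pixels


def cell_vectors(image):
    """Return (cell_intensity, cell_contrast) for one video image.

    image[y][x] is a pixel intensity: 240 rows of 320 values.
    That row-major layout is an assumption of this sketch.
    """
    intensity, contrast = [], []
    for n in range(GRID_H):          # cell row, top to bottom
        for m in range(GRID_W):      # cell column, left to right
            y0 = BORDER_Y + CELL * n
            x0 = BORDER_X + CELL * m
            pixels = [image[y][x]
                      for y in range(y0, y0 + CELL)
                      for x in range(x0, x0 + CELL)]
            mean = sum(pixels) / len(pixels)
            # Population standard deviation (divide by the pixel
            # count); the exact normalisation is assumed here.
            var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
            intensity.append(mean)       # linear index p = 9n + m
            contrast.append(math.sqrt(var))
    return intensity, contrast
```

Because the cells are appended row by row, entry 0 of each vector
belongs to the upper left cell and entry 62 to the lower right cell.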

Low level processor 26 computes the cell contrast and cell
intensity vectors for each segmented video image and passes
those vectors to mid-level processor 28. Based upon the cell
contrast vector and cell intensity vectors, mid-level
processor 28 computes an inter-image similarity value, a
maximum cell contrast value, a maximum cell intensity value,
a contrast change vector, and an intensity change vector for
each segmented video image.

The inter-image similarity value for a particular video
image comprises the cosine of the angle between the cell
contrast vector for that video image and the cell contrast
vector for another video image (in this embodiment, the
immediately prior field of the digital video signal). The
maximum cell contrast value for a particular video image
comprises the largest component of the cell contrast vector
for that image. Again, the term contrast refers to a value
correlated to the contrast of a particular cell comprising
the standard deviation of the pixel intensity values for that
cell. Similarly, the maximum cell intensity for a particular
image comprises the largest component of the cell intensity
vector for that image. The contrast change vector for a
particular image comprises the cell contrast vector for that
image minus the cell contrast vector for another video image
(in this embodiment, the same field in the immediately prior
frame of the digital video signal). Similarly, the intensity
change vector for a particular video image comprises the
difference between the cell intensity vector for that video
image and the cell intensity vector for another video image
(in this embodiment, the same field in the immediately prior
frame of the digital video signal).
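
A minimal sketch of these mid-level computations follows. It
assumes that the cell contrast and cell intensity vectors of all
fields are held in two lists indexed by field number k, so that the
immediately prior field is k-1 and the same field of the prior frame
is k-2. The function names, the dictionary layout, and the handling
of the first fields (no similarity value for k = 0, zero change
vectors for k < 2) are illustrative assumptions.

```python
import math


def inter_image_similarity(contrast_a, contrast_b):
    """Cosine of the angle between two cell contrast vectors."""
    dot = sum(a * b for a, b in zip(contrast_a, contrast_b))
    norm_a = math.sqrt(sum(a * a for a in contrast_a))
    norm_b = math.sqrt(sum(b * b for b in contrast_b))
    # A zero-length vector has no angle; returning 0.0 in that
    # case is a choice made by this sketch.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def change_vector(current, reference):
    """Element-wise difference: current minus reference."""
    return [c - r for c, r in zip(current, reference)]


def mid_level_values(contrast, intensity, k):
    """Mid-level values for field k.

    contrast[k] and intensity[k] are the cell vectors of field k.
    """
    zeros = [0.0] * len(contrast[k])
    return {
        # Compared with the immediately prior field, k - 1.
        "similarity": (inter_image_similarity(contrast[k], contrast[k - 1])
                       if k >= 1 else None),
        "max_contrast": max(contrast[k]),
        "max_intensity": max(intensity[k]),
        # Compared with the same field of the prior frame, k - 2.
        "contrast_change": (change_vector(contrast[k], contrast[k - 2])
                            if k >= 2 else zeros),
        "intensity_change": (change_vector(intensity[k], intensity[k - 2])
                             if k >= 2 else zeros),
    }
```

Scaling a contrast vector by a constant leaves the cosine
unchanged, which is the property that makes this measure tolerant of
overall brightness changes.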

After computing these values and vectors, mid-level
processor 28 outputs them to cut detector 30, blank/uniform
image detector 32, fade detector 34, dissolve detector 36 and
media detector 38. Only the values and vectors used by each
particular detector are provided to that detector. The
operation of each of the detectors 30-38 will be explained
more fully in connection with FIGS. 4-8. Each of the
detectors 30-38 detects various transitional markers within
the digital video signal. When a transitional marker has been
detected, the appropriate detector 30-38 generates an event
which is



5,767,923

passed to event resolver 40. Event resolver 40 processes the
transitional markers and generates a timeline annotating the
point at which each particular transitional event occurred.
In addition, event resolver 40 resolves conflicts between
events based upon a priority scheme so as to filter out
overlapping events.

The operation of each component of video transition
application 22 will now be described in more detail. Low
level processor 26 processes a digital video signal. At some
prior time, a digital video signal was received by video
event detection system 10. Alternatively, a raw video signal
may have been acquired through analog-to-digital conversion
performed by a video frame grabber. The raw video may be
a three channel signal, either red-green-blue (RGB) or
luminance-chrominance blue-chrominance red (YCbCr), or
a single channel luminance signal. When a luminance signal
is not directly available, as in the case of RGB, it is
obtained by software calculation. The National Television
Systems Committee (NTSC) standard color primary to luminance
signal conversion formula may be used to obtain a luminance
signal. In accordance with that formula, intensity for
a particular pixel equals 0.3 times the red value for that
pixel plus 0.59 times the green value for that pixel plus
0.11 times the blue value for that pixel. An intensity (or
luminance) value for each [...] (each field in this
embodiment) in the digital video signal. Each NTSC video
signal comprises a series of frames, each frame further
comprising two fields. In this embodiment then, a video image
comprises one field of a digital video signal.

The intensity value for each pixel in a digitized video
image in this embodiment lies within the range between 0
and 255 (8-bit resolution). The source luminance signal may
be down sampled by dropping values to reduce the scan line
pixel count by one-half so as to decrease subsequent image
processing times.

Low level processor 26 segments each video image (each
field in this embodiment) into a number of cells. Each cell
comprises a number of pixels. In this embodiment each
video image is divided into 63 cells using a 9x7 grid. FIG.
3 illustrates a segmented video image 42 that has been
segmented into 63 cells. Segmenting a video image in this
way allows the capture of spatial information about each
video image. This spatial information is useful in accurately
identifying transitional markers within the digital video
signal.

Each digital video image can be segmented into a finer or
coarser grid without departing from the teachings of the
invention. If the video image is divided too coarsely,
spatial information is lost and transitional marker detection
may become less accurate. If the video image is segmented
more finely, the noise immunity of the invention decreases,
which can also affect the accuracy of transitional marker
identification. In this embodiment each video image comprises
320x240 pixels. Each cell of the segmented image comprises
a square of 32 pixels by 32 pixels. As illustrated in
FIG. 3, an unused 16 pixel horizontal and 8 pixel vertical
border appears along each side of the segmented video
image 42. These pixels are preferably unused as pixels lying
in the border regions may not reliably carry picture content
when scanned by video capture devices.

Each segmented video image, then, may be associated
with an array of pixel intensity values, $I_k$. This
intensity array comprises 320 rows by 240 columns in this
embodiment and may be used to generate a cell intensity
vector and a cell contrast vector for each video image. In
this embodiment the cell intensity vector for a particular
segmented video image has an element for each cell in the
segmented video image. Each element is associated with a
particular cell of the segmented video image and comprises
the mean of the pixel intensity values for each pixel in that
cell. The intensity vector, $\mu_k$, can be calculated using
Formula (1) where the value p specifies the linear index of a
cell's mean value given cell indexes m and n. In this
embodiment, the entry p=0 in each feature vector corresponds
to the upper lefthand cell of segmented video image 42, and
the entry p=62 corresponds to the lower right-hand cell of
segmented video image 42.

$$\mu_k[p] = \frac{1}{1024}\sum_{i=0}^{31}\sum_{j=0}^{31} I_k[32m+16+i,\; 32n+8+j] \tag{1}$$

$$m = 0, 1, \ldots, 8 \qquad n = 0, 1, \ldots, 6 \qquad p = 9n + m$$

Each element of the cell intensity vector, $\mu_k$, thus
provides a measure of a cell's image brightness. Some of the
cells could be omitted from the vector or some of the pixels
could be omitted from the mean calculation without departing
from the scope of the invention. Also, the elements of the
intensity vector might be proportional to the average of
[...].

Each element of the cell contrast vector is associated with
one of the cells of segmented video image 42 and comprises
the standard deviation of the pixel intensity values for each
pixel in that cell. The cell contrast vector, $\sigma_k$, may
be computed using Formula (2) as follows:

$$\sigma_k[p] = \sqrt{\frac{1}{1024}\sum_{i=0}^{31}\sum_{j=0}^{31}\left(I_k[32m+16+i,\; 32n+8+j] - \mu_k[p]\right)^2} \tag{2}$$

$$m = 0, 1, \ldots, 8 \qquad n = 0, 1, \ldots, 6 \qquad p = 9n + m$$

where the value p specifies the linear index of a cell's
[...] value given cell indexes m and n. Again, entry p=0
in the vector corresponds to the upper lefthand cell
of segmented video image 42 and the value p=62 corresponds
to the lower righthand cell of segmented video image 42.
Each component of the cell contrast vector is used as a
measure of a particular cell's image contrast. As with the
cell intensity vector, some of the cells could be omitted
from the vector or some of the pixels omitted from the
standard deviation calculation without departing from the
scope of the invention.

Low level processor 26 outputs the cell contrast vector
and cell intensity vector for each segmented video image to
mid-level processor 28. Mid-level processor 28 then computes
five time domain values as functions of the cell
contrast vector and cell intensity vector for each video
image. Collectively, these values form five time domain
signals. These time domain signals comprise three scalar
values and two vector values. The scalar values include the
inter-image similarity value, maximum cell contrast value,
and maximum cell intensity value. The vector values include
the contrast change vector and intensity change vector.
Other values could be computed without departing from
the scope of the invention.

The maximum cell intensity value, $B_k$, for image number
k comprises the largest element of the cell intensity vector
for image number k and can be computed using the following
formula:
$$B_k = \max\{\mu_k[p] \mid p = 0, 1, \ldots, 62\}, \qquad k = 0, 1, \ldots, N-1 \tag{3}$$

The maximum cell contrast value, $C_k$, for image number
k comprises the largest element of the cell contrast vector
for that image and can be computed using the following
formula:

$$C_k = \max\{\sigma_k[p] \mid p = 0, 1, \ldots, 62\}, \qquad k = 0, 1, \ldots, N-1 \tag{4}$$

The inter-image similarity value, $S_k$, for image number k
represents the cosine of the angle between the cell contrast
vectors for adjacent video images (adjacent fields in this
embodiment). An angular distance measure was selected to
compare similarity between images because angular distance
is more tolerant of overall scene brightness changes
than Euclidean distance measures. For example, if an actor
causes a sudden change in scene brightness such as by
turning a light on in a room during a scene in the video, a
Euclidean distance measure may cause an erroneous
transitional marker detection. [...] by the inter-image
similarity value, $S_k$, will function properly in such a
situation and reduces the number of false transitional
marker detections. Inter-image similarity value, $S_k$, can
be computed using the following formula:

$$S_k = \frac{\sigma_k \cdot \sigma_{k-1}}{\lVert\sigma_k\rVert\,\lVert\sigma_{k-1}\rVert} \tag{5}$$

Mid-level processor 28 also computes two vector
values: the intensity change vector and contrast change
vector. The intensity change vector is used to measure
individual cell intensity changes while the contrast change
vector is used to measure individual cell contrast changes.
In this embodiment these changes are measured by cell
differences between fields of the same polarity (even or
odd) in adjacent frames. The intensity change vector, $b_k$,
for image k can be computed using the following formula:

$$b_k[p] = \begin{cases} \mu_k[p] - \mu_{k-2}[p] & k = 2, \ldots, N-1 \\ 0 & k = 0, 1 \end{cases} \qquad p = 0, 1, \ldots, 62 \tag{6}$$

The contrast change vector, $c_k$, for image k can be
calculated using the following formula:

$$c_k[p] = \begin{cases} \sigma_k[p] - \sigma_{k-2}[p] & k = 2, \ldots, N-1 \\ 0 & k = 0, 1 \end{cases} \qquad p = 0, 1, \ldots, 62 \tag{7}$$

The outputs of mid-level processor 28 are provided to
each of the detectors 30-38. The scalar and vector values
computed by mid-level processor 28 for each individual
video image can be combined to form time domain signals.
The detectors 30-38 may then filter such signals to aid in
detecting transitional markers in the video signal. In this
embodiment, these mid-level signals reflect a sampling rate
of 59.94 Hz, the standard NTSC field broadcast rate. This
sampling rate is used to measure event duration when
developing event models for each detector. If the sampling
rate changes, modification should also be made to the event
detection models.

The operation of each of the detectors 30-38 will now be
described in connection with FIGS. 4-8. Each of the
detectors 30-38 comprises a software process but could also
comprise specialized hardware.

FIG. 4 illustrates a method of detecting a cut in a digital
video signal. This method is performed by cut detector 30 in
video transition application 22. A cut is an abrupt
transition from one camera shot to another. In a digitized
NTSC video signal, this transition is seen as a significant
change in the picture from field to field (for example, from
field k-1 to field k). Its characteristic appearance to the
viewer is a quick spatiotemporal change of scene or camera
position. Cuts often occur during a change from one scene to
another and during shifts between different camera angles
within a scene.

The method of FIG. 4 begins at step 44 with initialization
and whatever preparation of the video signal is required. At
the conclusion of step 44, the process is prepared to operate
on a digital video signal. In step 46, a video image (in this
embodiment, a field of an NTSC video signal) of a digital
video signal is received by low level processor 26. Next in
step 48, low level processor 26 segments the video image
into cells and computes the cell contrast and cell intensity
vectors as described above. The cell contrast and cell
intensity vectors are output to mid-level processor 28.
Then, in step 50, mid-level processor 28 computes the
inter-image similarity value and contrast change vector
for the video image, as described above. These values are
output to cut detector 30.

In step 52, cut detector 30 analyzes the inter-image
similarity signal formed by the series of inter-image
similarity values for a series of video images to identify
characteristic spikes induced in this signal by a cut in the
digital video signal. A spike filter is applied to the
inter-image similarity signal in order to isolate spikes of a
width no greater than two video images (two fields in this
embodiment). The sensitivity of the spike filter in terms of
the width of the spikes isolated may vary depending upon
[...] due to motion in the video signal. The implemented
spike filter preferably operates over a -10 to +10 video
image window centered about the video image under
consideration. Each video image is a field of an NTSC signal
in this embodiment. The filtered spike size, g[k], for video
image k is defined by the following formulas:

$$G_k = \min\{S[k-m] \mid m = \pm 2, \pm 3, \ldots, \pm 10\} \tag{8}$$

$$g[k] = \begin{cases} G_k - S[k] & S[k] < G_k \\ 0 & \text{otherwise} \end{cases} \qquad \text{for } k = 0, 1, \ldots, N-1 \tag{9}$$

where g[k] represents the distance that a given spike extends
below the minimum similarity value of a neighboring video
image from within the filtering window.

Continuing in step 52, cut detector 30 also processes the
contrast change vector to generate a contrast difference
value for a particular video image. The contrast difference
value, Gc, comprises a count of all elements of the contrast
change vector associated with that video image that are
greater than a first contrast change value or less than a
second contrast change value. In this embodiment, Gc
indicates the count of cells having an inter-image contrast
difference of a magnitude greater than or equal to three
units of the standard deviation measure. Contrast differences
meeting or exceeding this value are considered to be
produced by cuts. The contrast difference value, Gc, for
video image k can be calculated using the following formulas:
$$\delta_k[p] = \begin{cases} 1 & -3.0 < c_k[p] < 3.0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{for } p = 0, 1, \ldots, 62 \tag{10}$$

$$G_c[k] = 63 - \sum_{p} \delta_k[p] \tag{11}$$
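
The two quantities produced in step 52, the filtered spike size
g[k] and the contrast difference value Gc, can be sketched as
follows. This is only an illustration under stated assumptions: the
function names are hypothetical, the neighbour offsets of 2 through
10 on each side are one reading of the -10 to +10 filter window, and
neighbours that fall outside the signal are simply skipped.

```python
def filtered_spike_size(similarity, k, offsets=range(2, 11)):
    """Distance that S[k] dips below its smallest neighbouring value.

    offsets lists the distances of the neighbours inspected on each
    side of k (an assumed reading of the filter window). Neighbours
    outside the signal are skipped, which is also an assumption.
    """
    neighbours = [similarity[k + sign * m]
                  for m in offsets
                  for sign in (-1, 1)
                  if 0 <= k + sign * m < len(similarity)]
    if not neighbours:
        return 0.0
    floor = min(neighbours)
    return floor - similarity[k] if similarity[k] < floor else 0.0


def contrast_difference(contrast_change, limit=3.0):
    """Count cells whose contrast change has magnitude >= limit."""
    return sum(1 for c in contrast_change if not (-limit < c < limit))
```

Skipping the immediate neighbours at distance 1 means that a dip
lasting two consecutive fields is still measured against the
surrounding signal rather than against itself.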

Next, in step 54, cut detector 30 determines whether a cut
has been detected for the video image under consideration.
Cut detector 30 employs three independent tests to determine
whether a cut has been detected. First, if the filtered
spike size value, g[k], for video image k exceeds a threshold
value (0.045 in this embodiment) and the ratio of the
filtered spike size value, g[k], to the difference between
one and the inter-image similarity value, $S_k$, exceeds a
second threshold (0.6 in this embodiment) then a small spike
cut event centered at video image k is detected. Second, if
the filtered spike size value, g[k], for video image k
exceeds another threshold (0.13 in this embodiment), then a
large spike cut event centered at video image k is detected.
Third, if the contrast difference value, Gc[k], for video
image k exceeds another threshold (45 in this embodiment)
and the difference between the contrast difference value for
video image k and the maximum of the contrast difference
values for a plurality of other video images in the
neighborhood of video image k (in this embodiment, Gc[k-1]
and Gc[k+1]) exceeds another threshold (7 in this
embodiment), then a cut event centered at video image k is
detected. Any one of these occurrences may cause a cut to be
detected in step 54. Other methods of detecting cuts can be
used without departing from the scope of the invention.
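
The three tests of step 54 can be sketched as a single predicate.
The threshold values are the ones quoted for this embodiment; the
function name cut_detected, the list-based inputs, the guard against
a zero denominator, and the decision to skip the third test at the
first and last image are assumptions of the sketch.

```python
def cut_detected(g, s, gc, k):
    """Apply the three independent cut tests to video image k.

    g  -- filtered spike sizes, one per image
    s  -- inter-image similarity values, one per image
    gc -- contrast difference values, one per image
    """
    # Test 1: small spike, judged relative to the dissimilarity 1 - S.
    dissimilarity = 1.0 - s[k]
    small_spike = (g[k] > 0.045
                   and dissimilarity > 0.0
                   and g[k] / dissimilarity > 0.6)

    # Test 2: large spike.
    large_spike = g[k] > 0.13

    # Test 3: many cells changed contrast, and clearly more than in
    # the neighbouring images k - 1 and k + 1. Skipped at the ends
    # of the signal, where a neighbour is missing (an assumption).
    contrast_jump = False
    if 0 < k < len(gc) - 1:
        contrast_jump = (gc[k] > 45
                         and gc[k] - max(gc[k - 1], gc[k + 1]) > 7)

    return small_spike or large_spike or contrast_jump
```

Since any one test is sufficient, the result is a plain logical OR;
a caller could instead return which test fired if the small spike and
large spike events need to be distinguished.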

This method may in some instances detect a cut where no
cut exists. Thus, an error check is applied in step 56 to
determine whether a true cut has been detected. When a cut
is detected, cut detector 30 determines whether the contrast
difference value, Gc, for a particular image exceeds a
threshold (25 in this embodiment). If this test is met, a cut
is recognized and cut detector 30 generates an event in step
58 indicating that a cut has been detected in the digital
video signal at video image k. Then, in step 60, if video
image k is not the last image in the digital video signal,
the process begins again at step 46. Otherwise, the procedure
terminates in step 62. If no cut was detected in step 54 or
if the error checking function revealed an erroneous cut
detection in step 56, then the method proceeds to step 60
from either of steps 54, 56. The error checking step 56 could
be omitted without departing from the scope of the invention.

Although the illustrated embodiment utilizes the inter-image
similarity value defined above based upon the cell
contrast vector, a similar method could be used to detect
cuts using an inter-image similarity value based upon the
cell intensity vector. In this alternative embodiment, the
inter-image similarity value comprises the cosine of the
angle between the cell intensity vector for the current image
and the cell intensity vector for a prior image. Although
different thresholds may be used for cut detection with this
similarity value, the remaining steps of the method can be
applied to detect cuts based upon the cell intensity vectors.

FIG. 5 illustrates a method for detecting blank and/or
uniform images in a digital video signal in accordance with
the invention. This method is performed by blank/uniform
image detector 32 in video transition application 22. A
uniform image has a single tone or color appearing as a
background. Uniform images are commonly found within
program introductions and credits and within commercials.
They often serve as a background for stationary and scrolled
text. Within commercials, fade-to-white and fade-from-white
image transition sequences employ uniform white
images. A blank image is a special case of a uniform image
and comprises an all black image. Normally, blanking is part
of a visual transition sequence where blank images are
inserted between cut and/or fade transitions. Blank images
are also used when a pause is required to inform the viewer
of a change of context, such as between commercials, or to
mark a major change in location or time. When blanking is
used to separate commercial and program segments,
experimental data indicates that blanking times may vary
significantly between one and eighty or more fields. When
blanking is used within a program or commercial segment,
experimentally obtained blanking times are more consistent,
normally ranging between four and sixteen fields.

The procedure begins in step 64 with initialization and
whatever preparation of the video signal is required. At the
conclusion of step 64, a digital video signal is ready to be
processed by video transition application 22. In step 66, a
video image is received by low-level processor 26. Next, in
step 68, low-level processor 26 segments the video image
into cells and computes the cell contrast and cell intensity
vectors as described above. Low-level processor 26 outputs
these vectors to mid-level processor 28. Then, in step 70,
mid-level processor 28 computes the maximum cell intensity
and maximum cell contrast values for the video image.
Mid-level processor 28 then outputs these values to
blank/uniform image detector 32.

In step 72, blank/uniform image detector 32 determines
whether the video image is a uniform image. It does so by
comparing the maximum cell contrast value to a threshold
value (5.0 in this embodiment). If the maximum cell contrast
is below the threshold, then a uniform image is detected and
a uniform image event is generated in step 74. Then, in step
76, blank/uniform image detector 32 compares the maximum
cell intensity to a second threshold (35 in this
embodiment) and the maximum cell contrast to a third
threshold (4.0 in this embodiment), and if both the maximum
cell intensity and maximum cell contrast for the image under
consideration are less than their respective thresholds, then
a blank image is detected and a blank image event is
generated in step 78. In an alternative embodiment, a blank
image could be detected by comparing only the maximum
cell intensity to a threshold. Following step 78, it is
determined in step 80 whether the image under consideration was
the last image or not. If so, then the procedure terminates at
step 82. If not, then the procedure processes the next digital
image by returning to step 66. If a uniform image was not
detected in step 72 or if a blank image was not detected in
step 76, execution proceeds to step 80. The thresholds used
in this embodiment may depend on the digitizer used to
digitize the video signal and should be experimentally
determined.
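
The decision flow of steps 72 through 78 can be sketched as follows. This is an illustrative sketch only: the function and parameter names are hypothetical, and the default thresholds are simply the values quoted for this embodiment.

```python
def classify_image(max_cell_contrast, max_cell_intensity,
                   uniform_threshold=5.0,
                   blank_intensity_threshold=35.0,
                   blank_contrast_threshold=4.0):
    """Return the events generated for one image, in the order of FIG. 5."""
    events = []
    # Step 72: a uniform image has low contrast in every cell.
    if max_cell_contrast < uniform_threshold:
        events.append("uniform")
        # Step 76: a blank image is a uniform image that is also dark.
        if (max_cell_intensity < blank_intensity_threshold
                and max_cell_contrast < blank_contrast_threshold):
            events.append("blank")
    return events
```

Because the blank test is reached only after a uniform image has been detected, every blank image in this sketch also produces a uniform image event.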

FIG. 6 illustrates a method for detecting fades in a digital
video signal in accordance with the invention. Fade detector
34 of video transition application 22 may detect fades in a
video signal using the method disclosed in FIG. 6. A
fade-out transition moves the viewer of a video from a scene
and camera shot to a uniform target image. Normally, the
target image is a blank or a black field, but in some instances,
the target image may be white. The fade-in, opposite in
effect from the fade-out, moves the viewer from a uniform
black or white image to a new scene. A time-weighted
average of the uniform image and the departing or arriving
scene is used to generate fades.

The duration of fade transitions is widely variable. Tran- 
sition data experimentally obtained included fades with 
lengths between eight and fifty fields with a modal value of 
eighteen and median value of twenty fields. Other data 
included fades as long as 200 fields. 
Fade-out and fade-in transitions are often used in
combination, forming a transition sequence separating one
scene from another. The new scene will often describe action
in a different location, at a different time, or of a different
nature. In network broadcast programming, fade-to-black
transition sequences occur most frequently at the beginning
and ending of program and commercial segments. Fade-to-white
and fade-from-white sequences are normally used
only within a program segment or within a commercial spot.

The method begins in step 84 with initialization and
whatever preparation is required to ready the digital video
signal. In step 86, a video image is received by low level
processor 26. Then, in step 88, low level processor 26
segments the image into cells and computes the cell contrast
and cell intensity vectors as described above. Low level
processor 26 outputs the cell contrast and cell intensity
vectors to mid-level processor 28. In step 90, mid-level
processor 28 computes the intensity change vector, maximum
cell contrast value, and maximum cell intensity value for the
video image. These values are output to fade detector 34.

Next, in step 92, fade detector 34 generates a fade value
for the video image being processed. Collectively, the fade
values for a series of video images create a fade signal. The
fade value, F[k], for a video image comprises the difference
between the total number of elements of the intensity change
vector, $b_k^{\Delta}$, for the video image that have a value
between a first lower limit and a first upper limit and the
total number of elements of the intensity change vector,
$b_k^{\Delta}$, for the video image that have a value between
a second lower limit and a second upper limit. This is a
histogram-type measure and uses histogram intervals.
Histogram intervals of 0.3 units are used in this embodiment.
It has been determined experimentally that fade events
produce brightness differences detectable by this interval
size. Smaller intervals are overly sensitive to noise in the
video signal. The brightness difference histogram,
$Hb_{\Delta}$, can be computed using the following formulas:

$$z[m,p] = \begin{cases} 1 & 0.3(m-18) \le b_k^{\Delta}[p] < 0.3(m-17) \\ 0 & \text{otherwise} \end{cases} \qquad m = 0, 1, \ldots, 35; \quad p = 0, 1, \ldots, 62 \tag{12}$$

$$Hb_{\Delta}[m] = \sum_{p} z[m,p] \tag{13}$$

The fade value, F[k], for image k in this embodiment
comprises the total of the histogram cells between 0.3 and
5.4 units minus the total of the histogram cells between -5.4
and -0.3. Differences within this range reflect a gradual
decrease or increase in the overall cell brightness.
Differences outside of this range were experimentally found to be
too large to be produced by fades. Other ranges could be
chosen without departing from the scope of the invention,
however. The fade value, F[k], for image k can be computed
using the following formula:

$$F[k] = \sum_{i=19}^{35} Hb_{\Delta}[i] - \sum_{i=0}^{16} Hb_{\Delta}[i] \tag{14}$$

Continuing with step 92, after the fade value, F[k], has
been computed for a video image, then the fade signal
formed by the fade values is filtered to produce a filtered
fade signal. The filtered fade signal comprises a plurality of
filtered fade values wherein each filtered fade value is
associated with a particular video image. A Gaussian filter
with μ=k, σ=10, is applied to the absolute value of the fade
signal, averaging totals over a 60 image (-3σ to +3σ)
window centered about image k. Other filter types and sizes
could be used without departing from the scope of the
invention. The filtered fade value, f[k], for a particular image
k can be computed using the following formula:

$$f[k] = \sum_{i=-30}^{30} \bigl|F[k-i]\bigr| \, \frac{1}{\sigma\sqrt{2\pi}} \, e^{-i^{2}/(2\sigma^{2})}, \qquad \sigma = 10 \tag{15}$$

In step 94, a fade is detected for a video image k if the
filtered fade value for that image, f[k], exceeds a threshold
value (15 in this embodiment). This embodiment of the
invention, however, will indicate fades only where image k
comprises the center of a fade. The fade center is defined as
the point where the fade value for image k, f[k], exceeds the
threshold value and where f[k]>f[k+1]. If the center of a fade
is detected, then it is determined in step 96 whether or not the
fade event is truly a fade event. Fade detector 34 determines
whether the fade event is truly a fade event for image k by
determining the sign of the fade value, F[k], for that image
and by examining the maximum cell intensity and maximum
cell contrast within a neighborhood of image k. This process
also identifies the type of fade detected as described in Table
1.

TABLE 1

Fade Type Codes as Derived from Signal Values

Fade value   Max. cell intensity   Max. cell contrast   Fade type
F[k] > 0     < 40                  > 5                  fade_in (from black)
F[k] > 0     >= 40                 <= 5                 fade_out (to white)
F[k] < 0     >= 40                 <= 5                 fade_in (from white)
F[k] < 0     < 40                  > 5                  fade_out (to black)

As described in Table 1, if the fade value for image k is
positive, the maximum cell intensity within a neighborhood
of image k and image k-30 is less than a threshold (40 in this
embodiment) and the maximum cell contrast within a
neighborhood of image k and image k+30 is greater than a
threshold (5 in this embodiment) then a fade-in from black
is detected. Similarly, with a positive fade value for image
k, a maximum cell intensity greater than a threshold in a
negative neighborhood about image k, and a maximum cell
contrast less than a threshold in a positive neighborhood
about image k, a fade-out to white is detected. Different size
neighborhoods or thresholds could be used without departing
from the scope of the invention. For a fade value for
image k that is negative, a fade-in from white is detected
where the maximum cell intensity within a positive
neighborhood about image k exceeds a threshold and where the
cell contrast for a negative neighborhood about image k is
less than a threshold. Finally, a fade-out to black is detected
where the fade value for image k is negative, the maximum
cell intensity in a positive neighborhood about image k is
less than a threshold and the maximum cell contrast in a
negative neighborhood about image k exceeds a threshold.
If none of these types of fades are detected, then it is
determined that a true fade does not exist and the method
proceeds to step 100 where it is determined whether the
current image is the last image in the digital video signal. If
so, the procedure terminates in step 102. If not, then the next
digital image is processed beginning with step 86.

Returning to step 96, if a true fade was detected, then in
step 98, a fade event indicating the type of fade (in or out
and black or white) is generated. The procedure then
continues at step 100 as discussed above. Similarly, if no fade
was detected in step 94, the procedure continues in step 100.

Alternatively, step 96 could be omitted and the type of
fade determined directly. Where the fade value for image k is
greater than zero, a fade-in from black would be indicated
where the maximum cell contrast in a negative neighborhood
about image k is less than a threshold and a fade-out to white
would be indicated where the maximum cell contrast in a
positive neighborhood about image k is less than a threshold.
Similarly, where the fade value for image k is less than zero,
a fade-in from white event would be indicated where the
maximum cell contrast in a negative neighborhood about
image k is less than a threshold and a fade-out to black
would be indicated where the maximum cell contrast in a
positive neighborhood about image k was less than a
threshold.

Any of the thresholds and ranges for the neighborhoods
can be changed without departing from the scope of the
invention. Various other methods of detecting false fades
could also be used without departing from the scope of the
invention.

FIG. 7 illustrates a method of detecting dissolve
transitions in a digital video signal in accordance with the
invention. Dissolve detector 36 of video transition application 22
may use this method to detect dissolve transitions. Dissolve
transitions move the viewer of a video signal between two
camera shots, A and B, by averaging them over time. A
weighted average is used which gradually reduces the
intensity of shot A, and increases that of B. Experimentally
determined transition data included dissolves with lengths
ranging from ten to one hundred two fields of an NTSC
video signal with a modal value of sixteen and a median
value of eighteen fields. Other samples had dissolves as long
as one hundred eighty fields.

The method begins in step 104 with initialization and
whatever preparation of the video signal is required. At the
end of step 104, a video signal is ready to be processed by
video transition application 22. In step 106, a video image is
received by low level processor 26. Then, in step 108, low
level processor 26 segments the video image into cells and
computes the cell contrast and cell intensity vectors as
described above. The cell contrast and cell intensity vectors
are output to mid-level processor 28. In step 110, mid-level
processor 28 computes a contrast change vector and
maximum cell contrast value for the video image. These are
output to dissolve detector 36.

In step 112, a first dissolve signal and a second dissolve
signal are generated. The first dissolve signal comprises a
series of first dissolve values each associated with a video
image while the second dissolve signal comprises a series of
second dissolve values, each associated with a video image.
The first dissolve signal is used to measure fast dissolves
(those dissolves taking from 16 to 60 fields to complete)
while the second dissolve signal is used to measure slow
dissolves (those dissolves taking between 60 and 180 fields
to complete). The first and second dissolve values for a video
image are computed using a contrast difference histogram,
$Hc_{\Delta}$. Histogram intervals of 0.5 units are used in this
embodiment. Experimentally recorded dissolve transitions
were found to produce contrast differences detectable by this
interval size. Other interval sizes could be used without
departing from the scope of the invention. Contrast difference
histograms can be computed using the following formulas:

$$x[m,p] = \begin{cases} 1 & 0.5(m-18) \le c_k^{\Delta}[p] < 0.5(m-17) \\ 0 & \text{otherwise} \end{cases} \qquad m = 0, 1, \ldots, 35 \tag{16}$$

$$Hc_{\Delta}[m] = \sum_{p} x[m,p] \tag{17}$$

[Formula 18 defines the gain value $\beta_k$ in terms of
min[max(50.0 - C_k, 0.0), 40.0], where C_k is the maximum cell
contrast; the remaining terms are not legible in the source.]

The first and second dissolve values are generated by
multiplying the contrast difference histogram, $Hc_{\Delta}$, within a
limited frequency range of cell contrast differences times $\beta_k$,
a gain value. $\beta_k$ is a value which tends to increase
dissolve values in response to lower contrast maximums.
A first dissolve value, D1[k], and second dissolve value, D2[k],
are calculated using the following formulas:

[Formulas 19 and 20: D1[k] and D2[k] are each the sum of two
ranges of $Hc_{\Delta}[i]$ multiplied by $\beta_k$; the summation
limits are not legible in the source.]

Other gain values could also be used in formulas 19 and 20.

Next, in step 114, the first and second dissolve signals,
comprised of the series of first and second dissolve values,
are filtered to produce a first filtered dissolve signal and
second filtered dissolve signal, each comprising a plurality
of filtered values each associated with a video image. Each
of the first and second dissolve signals is filtered by
convolving each signal with a specific difference of
Gaussians function. A difference of Gaussians is
used to avoid detecting a dissolve where a camera is panning
in a scene or where a scene has low contrast for a long period
of time. Filter window sizes of 120 and 240 video images are
used in this embodiment. Filter window sizes may vary
depending upon the video frame rate and/or the signal
scanning method. The first filtered value, d1[k], and second
filtered value, d2[k], for video image k can be calculated
using the following formulas:

[Formula 21 and the corresponding formula for d2[k]: each
filtered value is the convolution of the dissolve signal with a
difference-of-Gaussians kernel; the kernel parameters are not
legible in the source.]

It is then determined in step 116 whether a dissolve has
been detected or not. A fast dissolve is detected when the
first dissolve signal is greater than a first threshold value (22
in this embodiment). Dissolve detector 36 only generates a
dissolve event, however, when the center of the dissolve is
reached. The center of a fast dissolve is the position at which
the first filtered value for that image is greater than the
threshold and the first filtered value for image k exceeds the
first filtered value for image k+1. Similarly, a slow dissolve
centered at image k is detected where the second filtered
value is greater than a threshold (17 in this embodiment) and
the second filtered value for image k is greater than the
second filtered value for image k+1. If either type of dissolve
is detected, then an appropriate dissolve event is generated
at step 118. If no dissolve was detected in step 116, the
process continues in step 120 where it is determined whether
the video image being processed is the last image. If not,
then the next image is processed starting at step 106. If so,
then the procedure terminates at step 122.
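
The fade and dissolve detectors of FIGS. 6 and 7 share two building blocks: a 36-bin histogram of a change vector, and a center test on a filtered signal. The sketch below illustrates both under stated assumptions. All function names are hypothetical, the change vector is taken to be a plain list of numbers, and the Gaussian and difference-of-Gaussians filtering steps are left out.

```python
import math

def difference_histogram(changes, bin_width, n_bins=36, zero_bin=18):
    """Count change-vector elements per bin.

    Bin m spans [bin_width * (m - zero_bin), bin_width * (m - zero_bin + 1)),
    so bin 18 starts at zero; values outside the 36 bins are ignored.
    """
    hist = [0] * n_bins
    for value in changes:
        m = math.floor(value / bin_width) + zero_bin
        if 0 <= m < n_bins:
            hist[m] += 1
    return hist

def fade_value(intensity_changes):
    """Bins 19..35 (0.3 to 5.4) minus bins 0..16 (-5.4 to -0.3)."""
    hist = difference_histogram(intensity_changes, 0.3)
    return sum(hist[19:36]) - sum(hist[0:17])

def centers(filtered, threshold):
    """Indices k where filtered[k] exceeds the threshold and filtered[k+1]."""
    return [k for k in range(len(filtered) - 1)
            if filtered[k] > threshold and filtered[k] > filtered[k + 1]]
```

Read literally, the center test is satisfied by every sample on a falling slope that is still above the threshold, not only by the peak, so `centers([0, 10, 20, 18, 5, 0], 15)` reports both index 2 and index 3.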

FIG. 8 illustrates a method for detecting the type of
production media used to produce a digital video signal in
accordance with the invention. Media detector 38 of video
transition application 22 may use this method to identify the
type of production media used for a particular video signal.
A useful index for broadcast video is an identification of the
original production media. Specifically, one might desire to
know whether the original production was videotaped or
whether it was filmed. An NTSC broadcast signal that was
videotaped has a sixty field per second (thirty frame per
second) rate. However, cartoons may have a twelve or
twenty-four frame per second rate while a film normally has
a twenty-four frame per second rate. To translate a cartoon
or a film into an NTSC broadcast signal, some frames are
repeated. For a twenty-four frame per second film, the
frames are sent in a 3-2 field presentation wherein a first
frame of the film is broadcast during the first three fields of
an NTSC broadcast signal and the second frame of the film
is broadcast during the fourth and fifth fields of the NTSC
broadcast signal. This 3-2 pattern continues for each frame
in the film. For a cartoon with a twelve frame per second
rate, each frame is broadcast during five consecutive fields
of an NTSC broadcast signal.
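
The field repetition described above can be illustrated with a short sketch. The function names are hypothetical and frames are represented by arbitrary labels; the example only shows how twenty-four or twelve source frames expand to sixty fields.

```python
def pulldown_3_2(frames):
    """Expand film frames to fields: 3 fields, then 2 fields, alternating."""
    fields = []
    for index, frame in enumerate(frames):
        repeat = 3 if index % 2 == 0 else 2
        fields.extend([frame] * repeat)
    return fields

def cartoon_fields(frames):
    """Expand twelve frame per second material: 5 fields per frame."""
    return [frame for frame in frames for _ in range(5)]
```

For example, `pulldown_3_2(["a", "b", "c", "d"])` yields the field sequence a a a b b c c c d d, which repeats with a period of five fields.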

Media detector 38 of video transition application 22 
identifies each video clip within a digital video signal as 
having been recorded either on film or on videotape. Media 
detector 38 employs a set of five finite-state machines whose 
outputs are combined to determine the presence and length 
of a twenty-four frame per second or twelve frame per 
second segment. More or fewer state machines could be used.
Each finite-state machine is designed to synchronize to a 
twenty-four frame per second signal with a specific phase. 
These state machines will also synchronize with a twelve 
frame per second signal with a specific phase. Because the 
3-2 field presentation rate of a twenty-four frame per second 
film is fixed and periodic, it can be represented by a square
wave with a period five fields long. The leading edge of this 
waveform with respect to the start of the clip is considered 
to be its phase. Individual finite-state machines process the 
digital video signal to detect the leading and trailing edges 
of frame changes and compare these positions to those 
matching their internal hard-coded phase pattern. A count of 
images matching the pattern is maintained and recorded by 
each state machine. 

Each state machine is a five-state automaton, counting 
images in the video stream that match its phase configura- 
tion. The phase of each state machine is said to match a 
given phase pattern as long as frame-to-frame changes occur 
at the appropriate image positions. While matching is
successful, the finite-state machine's output count is
increased. When a match fails, the finite-state machine resets
this count to zero.

The process begins in step 124 with initialization and
whatever preparation of the video signal is required. At the
end of step 124, a digital video signal is ready to be
processed by video transition application 22. In step 126, a
video image is received by low level processor 26. In step
128, low level processor 26 segments the image into cells
and computes the cell contrast and cell intensity vectors for
the image. These vectors are output to mid-level processor
28. Mid-level processor 28, in step 130, computes the
inter-image similarity value, as discussed above. This value
is output to media detector 38.

In step 132, the run length for each of the five finite-state 
machines is updated in accordance with Table 2. The state of 
each finite-state machine is also changed in accordance with 
Table 2. 

TABLE 2

Present State   Next State   Next Count Output
A               B            m+1
B               C            if S_k ≥ T, m+1 else 0
C               D            if S_k ≥ T, m+1 else 0
D               E            m+1
E               A            if S_k ≥ T, m+1 else 0

As described in Table 2, a threshold value (0.993 in this
embodiment) is used to determine whether an image k is
different from image k-1. When the inter-image similarity
value S_k is less than the threshold value, it is determined that
a new image is present and a leading or trailing edge has
been found. Otherwise, the previous image, or one virtually
identical to it, is present and no edge is detected.

In step 134, the maximum run length is updated. A
combiner in media detector 38 compares the output counts 
of all five finite-state machines after the processing of each 
image and selects the maximum value as the current twenty- 
four frame per second run length. The combiner terminates 
a run as soon as the current run length is smaller than the 
previous run length. The previous run length is reported as 
the number of images found in the twenty-four frame per 
second video clip. 
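
A bank of five phase-shifted state machines with a maximum-selecting combiner can be sketched as follows. This is an illustration under several assumptions, not the patented implementation: Table 2 is read as incrementing the count unconditionally in states A and D and requiring S_k to be at or above the threshold in states B, C and E; each machine's phase is modeled by its starting state; and the names `run_lengths`, `STATES` and `UNCONDITIONAL` are hypothetical.

```python
STATES = "ABCDE"
# Assumed reading of Table 2: states A and D always increment the count.
UNCONDITIONAL = {"A", "D"}

def run_lengths(similarities, threshold=0.993):
    """Return the combiner output (maximum count) after each image."""
    states = list(range(5))   # machine j starts in state STATES[j]
    counts = [0] * 5
    output = []
    for s in similarities:
        for j in range(5):
            name = STATES[states[j]]
            if name in UNCONDITIONAL or s >= threshold:
                counts[j] += 1    # image matches this machine's phase
            else:
                counts[j] = 0     # unexpected frame change: reset
            states[j] = (states[j] + 1) % 5
        output.append(max(counts))
    return output
```

With a similarity sequence that follows the 3-2 pattern (a frame change at the first and fourth field of every five), one machine stays synchronized and the combiner output grows by one per image; when every field differs from the previous one, the output never rises above one.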

In step 136, it is determined whether the maximum run
length has exceeded 60 video images (60 fields in this
embodiment). A minimum run length of 60 video images
causes media detector 38 to identify a particular video clip
as a twenty-four frame per second clip. Another run length
threshold could also be used. Video segments not identified
as twenty-four frame per second clips are, by default,
assumed to be thirty frame per second clips. Video clips that
are twelve frame per second cartoons will be detected as
twenty-four frame per second clips in this embodiment.
Alternatively, an additional state machine or state machines
could be used to differentiate between twenty-four frame per
second filmed video clips and twelve frame per second
cartoon video clips.

If the maximum run length did not exceed 60 in step 136, 
then it is determined in step 146 whether the image being 
processed is the last image in the video signal. If not, then 
the next image is processed beginning with step 126. If so, 
then the procedure terminates in step 148. 

If the maximum run length was greater than 60 in step
136, then it is determined whether the maximum run length
is greater than the previous maximum run length in step 140.
If so, then the maximum run length is updated in step 142.
If not, then a film detect event is generated in step 138 with
the previous maximum run length stored as the run length
for the event. In step 144, the run length counters are reset.
If the maximum run length was updated in step 142, then a
check is made in step 143 to determine whether the current
image is the last image. If so, then a film detect event is
generated in step 145 with the maximum run length counter
stored as the run length.
Media detector 38 also generates an event in step 144 for
event resolver 40. This event identifies the number of the
image where a twenty-four frame per second clip began as
well as the run length of that clip.

Returning to FIG. 2, when any of the transitional markers 
are detected by one of the detectors 30-38, the events
generated by the detectors 30-38 are sent to event resolver 
40. Event resolver 40 resolves multiple and overlapping 
events and generates a timeline identifying the transitional 
markers encountered within a particular digital video signal. 

Because the detectors 30-38 operate independently, it is 
possible for multiple or overlapping events to be indicated. 
For example, dissolve detector 36 can be triggered by fades 
of certain durations or by a cut to blanking. Also, dissolves 
whose length is in between the slow and fast averaging 
window can cause both events to be placed in the event 
stream if they are centered on different image numbers. Fast 
fades in the video signal can cause a cut event to be detected 
between the last faded image and the following blank field. 
Event resolver 40 removes ambiguities caused by these and 
other similar conditions. 

Event resolver 40 chooses which event to report when
multiple, overlapping, or conflicting events are indicated.
The technique used by event resolver 40 is to choose an
event based upon a predetermined preferred event priority
and a minimum separation distance required of each event.
Event priorities, from highest to lowest, and their required
spacing are listed in Table 3. Minimum separation distances,
measured in image counts (field counts in this embodiment),
were experimentally determined by examining event lengths
and inter-event gaps found in sample broadcast video
signals.
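
The priority and separation rule can be sketched as follows. This is a simplified illustration, not the patented implementation: the `resolve` function and its data layout are hypothetical, only two rows of separation distances are filled in (blank field and cut), and any pair missing from the table is treated as requiring no separation.

```python
# Lower number = higher priority.
PRIORITY = {"BLANK_FIELD": 1, "UNIFORM_FIELD": 2, "FADE_IN": 3,
            "FADE_OUT": 3, "CUT": 4, "DISSOLVE_FAST": 5, "DISSOLVE_SLOW": 6}

# Partial separation table (in fields), keyed by selected event, then by
# the event encountered in the scan window. Illustrative subset only.
SEPARATION = {
    "BLANK_FIELD": {"BLANK_FIELD": 1, "UNIFORM_FIELD": 1, "FADE_IN": 8,
                    "FADE_OUT": 8, "CUT": 15, "DISSOLVE_FAST": 50,
                    "DISSOLVE_SLOW": 100},
    "CUT": {"CUT": 10, "DISSOLVE_FAST": 50, "DISSOLVE_SLOW": 100},
}

def resolve(events):
    """events: iterable of (image_position, event_type) pairs."""
    stream = {}
    for pos, kind in events:
        stream.setdefault(pos, set()).add(kind)
    for k in sorted(stream):
        if not stream[k]:
            continue  # everything at this position was already removed
        # Keep only the highest-priority event at position k.
        chosen = min(stream[k], key=lambda e: (PRIORITY[e], e))
        stream[k] = {chosen}
        # Scan forward and backward; the window is the separation minus one.
        for j in stream:
            if j == k:
                continue
            for other in list(stream[j]):
                window = SEPARATION.get(chosen, {}).get(other, 1) - 1
                if PRIORITY[other] >= PRIORITY[chosen] and abs(j - k) <= window:
                    stream[j].discard(other)
    return sorted((pos, kind) for pos, kinds in stream.items() for kind in kinds)
```

For instance, a blank field at image 100 suppresses a cut at image 105 and a fast dissolve at image 140, while a cut at image 200 lies outside every window and is kept.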



TABLE 3

Event Priorities and Minimum Separation Distances

                            Minimum Event Separation Distance (in fields)
Event Type      Priority    blnk  unif  fadi  fado  cut   dfst  dslw
BLANK_FIELD     1           1     1     8     8     15    50    100
UNIFORM_FIELD   2           *     1     8     8     15    50    100
FADE_IN         3                 *     30    30    30    65    115
FADE_OUT        3           *     *     30    30    30    65    115
CUT             4           *     *     *     *     10    50    100
DISSOLVE_FAST   5                                         100   150
DISSOLVE_SLOW   6                 *     *     *     *     *

Event resolution proceeds as follows. The event stream is
scanned to find the image position, k, of the next indicated
event. If the event stream for image k specifies more than
one event, the highest priority event, E, is selected according
to Table 3. Other indicated events at position k are removed.

Having found an event E, a second scan is initiated
proceeding both forward and backward in the event stream
beginning at images k+1 and k-1. If events of equal or lower
priority are found in the scan window, these events are
removed from the stream. The minimum separation distance
specified in the table, minus one, is the length of the
scanning window for each equal or lower priority event
which may be encountered. Higher priority events are not
removed if they occur in the scanning window.

Event resolver 40 produces both a video annotation report
and a video timeline. These items are placed in an output file
with the video annotation report occurring first. Frame-rate
events are always reported and are not altered by event
resolver 40.

It should be understood that the invention is not limited to
the illustrated embodiment and that a number of
substitutions can be made without departing from the scope and
teachings of the present invention. For example, although
video transition application 22 is a software application, all
or a portion of the functions performed by video transition
application 22 could be performed in hardware. Also,
although the present embodiment processes NTSC broadcast
video signals comprising a series of frames each comprised
of two fields, the invention could be used for any type
of video signal.

This embodiment of video transition application 22
performs segmentation on each field of an NTSC broadcast
video signal. Alternatively, the methods of the invention
could be used by segmenting only every other field, every
other frame, etc. of the NTSC broadcast video image.
Similarly, even if all fields or all frames are segmented, the
various image features such as the cell contrast and intensity
vectors might only be computed for a subset of the
segmented video images. Also, a subset of the pixels within
each cell might be used to compute the cell contrast and cell
intensity values for that cell. Similarly, a subset of the cell
intensity and cell contrast values for a video image might be
used to form the cell contrast and cell intensity vectors.
Certain thresholds were described herein. These thresholds
could all vary from the thresholds described without departing
from the scope of the invention. Other substitutions are
also possible and can be made without departing from the
spirit and scope of the invention as defined by the appended
claims.

What is claimed is:

1. A method of detecting a cut in a digital video signal
made up of a series of video images, comprising:

segmenting a plurality of the video images, each of the
plurality of video images segmented into a first number
of cells, each cell comprising a second number of
pixels, a pixel having a pixel intensity value
representing the intensity of the pixel;

generating a plurality of cell contrast vectors each
associated with one of the segmented video images, each
element of the cell contrast vector associated with one
of the cells and comprising the standard deviation of
the pixel intensity values for the pixels in that cell;

generating a cut-detect signal for a video image in
response to the cell contrast vector associated with that
image.

2. The method of claim 1, wherein each video image
comprises a field of a video signal.

3. The method of claim 1, wherein each video image
comprises a frame of a video signal.

4. The method of claim 1, further comprising:

computing a plurality of inter-image similarity values
each associated with one of the segmented video
images, the inter-image similarity value for a video
image comprising the cosine of the angle between the
cell contrast vector for that video image and the cell
contrast vector for another video image; and

wherein the step of generating a cut-detect signal further
comprises generating a cut-detect signal for a video
image in response to the inter-image similarity value
associated with that image.

5. The method of claim 4, further comprising:

filtering the inter-image similarity values using a spike
filter to produce a filtered spike size value for each of
the segmented video images; and

wherein the step of generating a cut-detect signal further
comprises generating a cut-detect signal for a video
image if the filtered spike size value for that video
image exceeds a first threshold.

6. The method of claim 4, wherein the inter-image
similarity signal for a video image further comprises the cosine
of the angle between the cell contrast vector for that video
image and the cell contrast vector for the immediately
previous video image of the digital video signal.

7. The method of claim 5. further comprising: 
generating a cut-detect signal for a video image if the 2 s 

filtered spike size value for that video image exceeds a 
second threshold and the ratio of the filtered spike size 
value for that video image to the difference between 
one and the inter-image similarity value for that video 
image exceeds a third threshold. 30 

8. The method of claim 5, further comprising:
generating a plurality of contrast change vectors each
associated with one of the segmented video images, the
contrast change vector for a video image comprising
the cell contrast vector for that video image minus the
cell contrast vector for another video image;

computing a contrast difference value for each of the
segmented video images, the contrast difference value
for a video image comprising a count of all elements of
the contrast change vector associated with that video
image that are greater than a first contrast change value
or less than a second contrast change value; and

generating a cut-event signal for a video image if a
cut-detect signal has been generated for that video
image and the contrast difference value for that video
image exceeds a fourth threshold.
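
The contrast change vector, the contrast difference value, and the cut-event test recited above can be illustrated with a short Python sketch. All names are hypothetical, and the choice of reference image is left to the caller because the claim only says "another video image".

```python
def contrast_difference_value(current, reference, first_change, second_change):
    """Count the cells whose contrast changed by a large amount.

    current, reference -- cell contrast vectors of equal length
    first_change       -- elements greater than this value are counted
    second_change      -- elements less than this value are counted
    """
    # Contrast change vector: current vector minus reference vector.
    change = [c - r for c, r in zip(current, reference)]
    return sum(1 for d in change if d > first_change or d < second_change)


def cut_event(cut_detected, contrast_difference, fourth_threshold):
    """A cut-event needs a prior cut-detect and enough changed cells."""
    return cut_detected and contrast_difference > fourth_threshold
```

For example, with `current = [10, 2, 5, 5]`, `reference = [1, 9, 5, 6]`, and change values of 3 and -3, the change vector is `[9, -7, 0, -1]` and two cells are counted.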

9. The method of claim 8, further comprising:
generating a cut-detect signal for a video image if the
filtered spike size value for that video image exceeds a
second threshold and the ratio of the filtered spike size
value for that video image to the difference between
one and the inter-image similarity value for that video
image exceeds a third threshold.

10. The method of claim 8, further comprising:
generating a cut-detect signal for a video image if the
contrast difference value for that video image exceeds
a fifth threshold and the difference between the contrast
difference value for that video image and the maximum
of the contrast difference values for a plurality of other
video images exceeds a sixth threshold.
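
A minimal Python sketch of the test recited above follows. The names are hypothetical, and which "other video images" supply the comparison values is not stated in the claim, so the caller passes them in.

```python
def contrast_peak_cut_detect(value, other_values, fifth_threshold, sixth_threshold):
    """Detect a cut when one contrast difference value stands out.

    value        -- contrast difference value for the video image under test
    other_values -- contrast difference values for other video images
                    (assumption: chosen by the caller, e.g. nearby images)
    """
    if not other_values:
        raise ValueError("at least one comparison value is required")
    exceeds_floor = value > fifth_threshold
    stands_out = value - max(other_values) > sixth_threshold
    return exceeds_floor and stands_out
```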

11. The method of claim 8, wherein each video image
comprises a field of a video signal; and

wherein the contrast change vector for a video image
comprises the cell contrast vector for that video image
minus the cell contrast vector for the same field in the
immediately prior frame of the digital video signal.



12. A computerized system for detecting a cut in a digital 
video signal made up of a series of video images, compris- 
ing: 

a computer-readable medium; and 
a computer program encoded on the computer-readable 
medium, the computer program further operable to 
segment a plurality of the video images, each of the 
plurality of video images segmented into a first 
number of cells, each cell comprising a second 
number of pixels, a pixel having a pixel intensity 
value representing the intensity of the pixel; 
generate a plurality of cell contrast vectors each asso- 
ciated with one of the segmented video images, each 
element of the cell contrast vector associated with 
one of the cells and comprising the standard devia- 
tion of the pixel intensity values for the pixels in that 
cell; and 

generate a cut-detect signal for a video image in 
response to the cell contrast vector associated with 
that image. 
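
The segmenting and cell contrast vector steps recited above can be illustrated with the following Python sketch. It is a sketch under stated assumptions, not the patented implementation: the image is assumed to be a list of rows of intensity values, the cell grid is assumed to divide the image evenly, and the population standard deviation is used because the claim does not say which form of standard deviation is intended.

```python
from statistics import pstdev


def cell_contrast_vector(image, cell_rows, cell_cols):
    """Return one standard deviation per cell, in row-major cell order.

    image     -- list of rows, each a list of pixel intensity values
    cell_rows -- number of cells down the image
    cell_cols -- number of cells across the image
    """
    height, width = len(image), len(image[0])
    # Assumption: height and width are exact multiples of the grid size.
    cell_h, cell_w = height // cell_rows, width // cell_cols
    vector = []
    for r in range(cell_rows):
        for c in range(cell_cols):
            pixels = [
                image[y][x]
                for y in range(r * cell_h, (r + 1) * cell_h)
                for x in range(c * cell_w, (c + 1) * cell_w)
            ]
            # Assumption: population standard deviation of the cell.
            vector.append(pstdev(pixels))
    return vector
```

A uniform cell yields zero contrast, while a cell whose pixels alternate between 10 and 20 yields a contrast of 5.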

13. The computerized system of claim 12 wherein the 
computer program is further operable to: 

compute a plurality of inter-image similarity values each 
associated with one of the segmented video images, the
inter-image similarity value for a video image com- 
prising the cosine of the angle between the cell contrast 
vector for that video image and the cell contrast vector 
for another video image; and 

wherein the computer program generates a cut-detect 
signal for a video image in response to the inter-image 
similarity value associated with that image. 

14. The computerized system of claim 13 wherein the 
computer program is further operable to: 

filter the inter-image similarity values using a spike filter
to produce a filtered spike size value for each of the
segmented video images; and

generate a cut-detect signal for a video image if the
filtered spike size value for that video image exceeds a
first threshold.

15. The computerized system of claim 14 wherein the 
computer program is further operable to: 

generate a cut-detect signal for a video image if the 
filtered spike size value for that video image exceeds a
second threshold and the ratio of the filtered spike size 
value for that video image to the difference between 
one and the inter-image similarity value for that video 
image exceeds a third threshold. 

16. The computerized system of claim 14 wherein the 
computer program is further operable to: 

generate a plurality of contrast change vectors each asso- 
ciated with one of the segmented video images, the 
contrast change vector for a video image comprising 
the cell contrast vector for that video image minus the 
cell contrast vector for another video image; 

compute a contrast difference value for each of the 
segmented video images, the contrast difference value 
for a video image comprising a count of all elements of 
the contrast change vector associated with that video
image that are greater than a first contrast change value 
or less than a second contrast change value; and 

generate a cut-event signal for a video image if a cut- 
detect signal has been generated for that video image 
and the contrast difference value for that video image 
exceeds a fourth threshold.

17. The computerized system of claim 14 wherein the
computer program is further operable to: 

generate a cut-detect signal for a video image if the 
filtered spike size value for that video image exceeds a 
second threshold and the ratio of the filtered spike size
value for that video image to the difference between 
one and the inter-image similarity value for that video 
image exceeds a third threshold. 

18. The computerized system of claim 14 wherein the 
computer program is further operable to: 

generate a cut-detect signal for a video image if the 
contrast difference value for that video image exceeds 
a fifth threshold and the difference between the contrast 
difference value for that video image and the maximum 
of the contrast difference values for a plurality of other 
video images exceeds a sixth threshold. 

19. A computerized system for detecting a cut in a digital
video signal made up of a series of video images, compris- 
ing: 

a storage medium; 

a processor coupled to the storage medium;

a digital video signal source coupled to the storage
medium and providing the digital video signal to the
system;

a computer program stored in the storage medium, the 
computer program operable to run on the processor and 
process the digital video signal, the computer program 
further operable to 

segment a plurality of the video images, each of the 
plurality of video images segmented into a first 
number of cells, each cell comprising a second 
number of pixels, a pixel having a pixel intensity 
value representing the intensity of the pixel; 

generate a plurality of cell contrast vectors each asso- 
ciated with one of the segmented video images, each 
element of the cell contrast vector associated with 
one of the cells and comprising the standard devia- 
tion of the pixel intensity values for the pixels in that 
cell; 

generate a cut-detect signal for a video image in 
response to the cell contrast vector associated with 
that image. 

20. The computerized system of claim 19 wherein the 
computer program is further operable to: 

compute a plurality of inter-image similarity values each 
associated with one of the segmented video images, the
inter-image similarity value for a video image com- 
prising the cosine of the angle between the cell contrast 
vector for that video image and the cell contrast vector 
for another video image; and 

wherein the computer program generates a cut-detect 
signal for a video image in response to the inter-image 
similarity value associated with that image. 

21. The computerized system of claim 20 wherein the
computer program is further operable to: 

filter the inter-image similarity values using a spike filter
to produce a filtered spike size value for each of the
segmented video images; and

generate a cut-detect signal for a video image if the
filtered spike size value for that video image exceeds a
first threshold.

22. The computerized system of claim 20 wherein the 
computer program is further operable to: 

generate a cut-detect signal for a video image if the 
filtered spike size value for that video image exceeds a 
second threshold and the ratio of the filtered spike size 
value for that video image to the difference between 
one and the inter-image similarity value for that video
image exceeds a third threshold. 

23. The computerized system of claim 20 wherein the 
computer program is further operable to:

generate a plurality of contrast change vectors each asso- 
ciated with one of the segmented video images, the 
contrast change vector for a video image comprising 
the cell contrast vector for that video image minus the 
cell contrast vector for another video image; 

compute a contrast difference value for each of the 
segmented video images, the contrast difference value 
for a video image comprising a count of all elements of 
the contrast change vector associated with that video 
image that are greater than a first contrast change value 
or less than a second contrast change value; and 

generate a cut-event signal for a video image if a cut- 
detect signal has been generated for that video image 
and the contrast difference value for that video image
exceeds a fourth threshold. 

24. A method of detecting a cut in a digital video signal 
made up of a series of video images, comprising: 

segmenting a plurality of the video images, each of the 
plurality of video images segmented into a first number 
of cells, each cell comprising a second number of 
pixels, a pixel having a pixel intensity value represent- 
ing the intensity of the pixel; 

generating a plurality of cell intensity vectors each asso- 
ciated with one of the segmented video images, each 
element of the cell intensity vector associated with one 
of the cells and proportional to the average of the pixel 
intensity values for the pixels in that cell; 

generating a cut-detect signal for a video image in 
response to the cell intensity vector associated with that 
image. 
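
The cell intensity vector recited above differs from the cell contrast vector only in the per-cell statistic. The Python sketch below illustrates it under stated assumptions: the image is a list of rows of intensity values, the cell grid divides the image evenly, and the hypothetical `scale` parameter stands in for the unspecified constant of proportionality.

```python
def cell_intensity_vector(image, cell_rows, cell_cols, scale=1.0):
    """Return a value proportional to the mean intensity of each cell.

    image     -- list of rows, each a list of pixel intensity values
    cell_rows -- number of cells down the image
    cell_cols -- number of cells across the image
    scale     -- assumed constant of proportionality (1.0 gives the mean)
    """
    height, width = len(image), len(image[0])
    # Assumption: height and width are exact multiples of the grid size.
    cell_h, cell_w = height // cell_rows, width // cell_cols
    vector = []
    for r in range(cell_rows):
        for c in range(cell_cols):
            total = sum(
                image[y][x]
                for y in range(r * cell_h, (r + 1) * cell_h)
                for x in range(c * cell_w, (c + 1) * cell_w)
            )
            vector.append(scale * total / (cell_h * cell_w))
    return vector
```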

25. The method of claim 24, further comprising: 
computing a plurality of inter-image similarity values
each associated with one of the segmented video
images, the inter-image similarity value for a video 
image comprising the cosine of the angle between the 
cell intensity vector for that video image and the cell 
intensity vector for another video image; and 
wherein the step of generating a cut-detect signal further 
comprises generating a cut-detect signal for a video 
image in response to the inter-image similarity value 
associated with that image. 

26. The method of claim 25, further comprising:
filtering the inter-image similarity values using a spike
filter to produce a filtered spike size value for each of
the segmented video images; and 
wherein the step of generating a cut-detect signal further 
comprises generating a cut-detect signal for a video 
image if the filtered spike size value for that video 
image exceeds a first threshold. 

27. The method of claim 25, wherein the inter-image
similarity value for a video image further comprises the 
cosine of the angle between the cell intensity vector for that 
video image and the cell intensity vector for the immediately 
previous video image of the digital video signal. 

28. The method of claim 26, further comprising:
generating a cut-detect signal for a video image if the
filtered spike size value for that video image exceeds a
second threshold and the ratio of the filtered spike size 
value for that video image to the difference between 
one and the inter-image similarity value for that video 
image exceeds a third threshold. 

* * * * *









